
    Microdata protection through approximate microaggregation

    Microdata protection is a hot topic in the field of Statistical Disclosure Control, which gained special interest after the disclosure of the search queries of 658,000 users by the America Online (AOL) search engine in August 2006. Many algorithms, methods and properties have been proposed to deal with microdata disclosure. One of the emerging concepts in microdata protection is k-anonymity, introduced by Samarati and Sweeney. k-anonymity provides a simple and efficient approach to protecting private individual information and is gaining increasing popularity. It requires that every record in the released microdata table be indistinguishably related to no fewer than k respondents. In this paper, we apply the concept of entropy to propose a distance metric that evaluates the amount of mutual information among records in microdata, and we propose a method of constructing a dependency tree to find the key attributes, which we then use to perform approximate microaggregation. Further, we adopt this new microaggregation technique to study the k-anonymity problem and develop an efficient algorithm for it. Experimental results show that the proposed microaggregation technique is efficient and effective in terms of running time and information loss.
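    To make the entropy-based machinery above concrete, the sketch below computes pairwise mutual information between attributes and builds a dependency tree as a maximum spanning tree over those scores. It is a minimal sketch only: the paper's exact distance metric and key-attribute selection are not reproduced, and the helper names and toy table are illustrative assumptions.
```python
# Pairwise mutual information between attributes and a dependency tree built
# as a maximum spanning tree over it (a sketch, not the paper's algorithm).
from collections import Counter
from itertools import combinations
from math import log2

import networkx as nx  # assumed available for the spanning-tree step


def entropy(column):
    n = len(column)
    return -sum((c / n) * log2(c / n) for c in Counter(column).values())


def mutual_information(col_x, col_y):
    # I(X; Y) = H(X) + H(Y) - H(X, Y)
    joint = list(zip(col_x, col_y))
    return entropy(col_x) + entropy(col_y) - entropy(joint)


def dependency_tree(table):
    """table: dict mapping attribute name -> list of values (same length)."""
    g = nx.Graph()
    g.add_nodes_from(table)
    for a, b in combinations(table, 2):
        g.add_edge(a, b, weight=mutual_information(table[a], table[b]))
    # Attributes joined by high mutual information end up on the tree's edges.
    return nx.maximum_spanning_tree(g, weight="weight")


if __name__ == "__main__":
    toy = {
        "age": [25, 25, 40, 40, 60, 60],
        "zip": ["2000", "2000", "2010", "2010", "2020", "2020"],
        "job": ["A", "A", "B", "B", "C", "C"],
    }
    print(sorted(dependency_tree(toy).edges(data="weight")))
```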

    Incremental DCOP Search Algorithms for Solving Dynamic DCOP Problems

    Distributed constraint optimization problems (DCOPs) are well-suited for modeling multi-agent coordination problems. However, most research has focused on developing algorithms for solving static DCOPs. In this paper, we model dynamic DCOPs as sequences of (static) DCOPs with changes from one DCOP to the next one in the sequence. We introduce the ReuseBounds procedure, which can be used by any-space ADOPT and any-space BnB-ADOPT to find cost-minimal solutions for all DCOPs in the sequence faster than by solving each DCOP individually. This procedure allows those agents that are guaranteed to remain unaffected by a change to reuse their lower and upper bounds from the previous DCOP when solving the next one in the sequence. Our experimental results show that the speedup gained from this procedure increases with the amount of memory the agents have available.
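    A minimal sketch of the bound-reuse idea, assuming a simple agent record with cached lower and upper bounds: only agents touched by a changed constraint reset their bounds before the next DCOP in the sequence is solved. This is not the ReuseBounds procedure itself, which operates inside any-space ADOPT and BnB-ADOPT; the names and structure below are illustrative.
```python
# Agents cache bounds from the previous DCOP; only affected agents reset them.
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    lower_bound: float = 0.0
    upper_bound: float = float("inf")
    affected: bool = False

    def reset_bounds(self):
        self.lower_bound, self.upper_bound = 0.0, float("inf")


def prepare_next_dcop(agents, changed_constraints):
    """Mark agents touched by changed constraints; the rest keep their bounds."""
    touched = {name for constraint in changed_constraints for name in constraint}
    for agent in agents:
        agent.affected = agent.name in touched
        if agent.affected:
            agent.reset_bounds()  # bounds must be recomputed for the new DCOP
        # unaffected agents simply reuse lower_bound / upper_bound


if __name__ == "__main__":
    agents = [Agent("a1", 3, 7), Agent("a2", 2, 2), Agent("a3", 5, 9)]
    prepare_next_dcop(agents, changed_constraints=[("a1", "a3")])
    print([(a.name, a.lower_bound, a.upper_bound, a.affected) for a in agents])
```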

    A Literature Survey and Classifications on Data Deanonymisation

    The problem of disclosing private anonymous data has become increasingly serious, particularly with the possibility of carrying out deanonymisation attacks on published data. The related work available in the literature is inadequate in terms of the number of techniques analysed, and is limited to certain contexts such as Online Social Networks. We survey a large number of state-of-the-art deanonymisation techniques, covering a variety of methods and different types of data. Our aim is to build a comprehensive understanding of the problem. For this survey, we propose a framework to guide a thorough analysis and classification. We classify deanonymisation approaches based on the type and source of auxiliary information and on the structure of the target datasets. Moreover, potential attacks, threats and some suggested assistive techniques are identified. This can inform research on the deanonymisation problem and assist in the advancement of privacy protection.
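    As a rough illustration of the classification dimensions mentioned above (type and source of auxiliary information, structure of the target dataset), the sketch below tags surveyed techniques along those two axes. The concrete category and technique names are assumptions for illustration, not the survey's taxonomy.
```python
# Tagging de-anonymisation techniques by auxiliary-information source and
# target-data structure (illustrative categories only).
from dataclasses import dataclass
from enum import Enum


class AuxiliarySource(Enum):
    PUBLIC_RECORDS = "public records"
    SOCIAL_NETWORK = "online social network"
    BACKGROUND_KNOWLEDGE = "adversary background knowledge"


class TargetStructure(Enum):
    RELATIONAL = "relational microdata"
    GRAPH = "graph / social-network data"
    HIGH_DIMENSIONAL = "sparse high-dimensional data (e.g. ratings)"


@dataclass
class SurveyedTechnique:
    name: str
    auxiliary: AuxiliarySource
    target: TargetStructure


catalogue = [
    SurveyedTechnique("linkage attack", AuxiliarySource.PUBLIC_RECORDS,
                      TargetStructure.RELATIONAL),
    SurveyedTechnique("structural re-identification", AuxiliarySource.SOCIAL_NETWORK,
                      TargetStructure.GRAPH),
]
print([(t.name, t.auxiliary.value, t.target.value) for t in catalogue])
```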

    Towards identity anonymization in large survey rating data

    We study the challenge of identity protection in large public survey rating data. Even though survey participants do not reveal any of their ratings, their survey records are potentially identifiable by using information from other public sources. None of the existing anonymisation principles (e.g., k-anonymity, l-diversity) can effectively prevent such breaches in large survey rating data sets. In this paper, we tackle the problem by defining the (k, ε)-anonymity principle. The principle requires that, for each transaction t in the given survey rating data T, at least (k - 1) other transactions in T have ratings similar to t, where the similarity is controlled by ε. We propose a greedy approach to anonymise survey rating data and apply the method to two real-life data sets to demonstrate its efficiency and practical utility.
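    A minimal sketch of the (k, ε)-anonymity requirement as stated above: each transaction must have at least k−1 others with ε-similar ratings. The per-issue maximum-difference similarity test and function names are assumptions, and the paper's greedy anonymisation step is not shown.
```python
# Brute-force check of the (k, eps)-anonymity requirement over rating vectors.
def similar(t1, t2, eps):
    """Two rating vectors are 'similar' if they differ by at most eps per issue."""
    return all(abs(a - b) <= eps for a, b in zip(t1, t2))


def satisfies_k_eps_anonymity(ratings, k, eps):
    for i, t in enumerate(ratings):
        neighbours = sum(
            1 for j, other in enumerate(ratings) if j != i and similar(t, other, eps)
        )
        if neighbours < k - 1:
            return False
    return True


if __name__ == "__main__":
    data = [(5, 1, 4), (5, 2, 4), (1, 5, 2), (1, 4, 2)]
    print(satisfies_k_eps_anonymity(data, k=2, eps=1))  # True for this toy data
```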

    Injecting purpose and trust into data anonymisation

    Most existing work on data anonymisation targets the optimisation of anonymisation metrics to balance data utility and privacy, while ignoring the effects of the data requester's trust level and application purpose during anonymisation. The aim of this paper is to propose a finer-grained anonymisation scheme that takes into account the data requester's trust value and the specific application purpose. We prioritise the attributes for anonymisation based on how important and critical they are to the specified application purposes, and propose a trust evaluation strategy to quantify the data requester's reliability. We further build a projection between the trust value and the degree of data anonymisation, which determines to what extent the data should be anonymised. A decomposition algorithm is developed to find the desired anonymisation solution, and it guarantees uniqueness and correctness.
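    The sketch below illustrates one possible projection from a trust value to per-attribute generalisation levels, with attributes weighted by their relevance to the stated purpose. The linear mapping, the priority weights and the parameter names are illustrative assumptions, not the paper's decomposition algorithm.
```python
# Mapping a requester's trust value to per-attribute generalisation levels.
def generalisation_levels(trust, attribute_priority, max_level=4):
    """
    trust: requester's trust value in [0, 1] (1 = fully trusted).
    attribute_priority: dict attribute -> weight in [0, 1]
        (1 = critical for the application purpose, 0 = irrelevant).
    Returns dict attribute -> generalisation level (0 = release as-is).
    """
    levels = {}
    for attr, priority in attribute_priority.items():
        # Less trust and lower purpose-relevance both push the level up.
        level = round(max_level * (1 - trust) * (1 - 0.5 * priority))
        levels[attr] = min(max_level, max(0, level))
    return levels


if __name__ == "__main__":
    priorities = {"age": 0.9, "zipcode": 0.4, "occupation": 0.1}
    print(generalisation_levels(trust=0.8, attribute_priority=priorities))
    print(generalisation_levels(trust=0.2, attribute_priority=priorities))
```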

    On the complexity of restricted k-anonymity problem

    One of the emerging concepts in microdata protection is k-anonymity, introduced by Samarati and Sweeney. k-anonymity provides a simple and efficient approach to protecting private individual information and is gaining increasing popularity. It requires that every tuple (record) in the released microdata table be indistinguishably related to no fewer than k respondents. In this paper, we introduce two new variants of the k-anonymity problem, namely the Restricted k-anonymity problem and the Restricted k-anonymity problem on attributes (where suppressing an entire attribute is allowed). We prove that both problems are NP-hard for k ≥ 3. These results imply the main results obtained by Meyerson and Williams. On the positive side, we develop a polynomial-time algorithm for the Restricted 2-anonymity problem by giving a graphical representation of the microdata table.
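    For intuition about the graphical representation, the sketch below treats records as vertices, weights each edge by the number of cells that must be suppressed to make the pair identical, and pairs records with a minimum-weight perfect matching (via negated weights in networkx). This is only one way such a graph view can be used for 2-anonymity; it is not claimed to be the paper's Restricted 2-anonymity algorithm.
```python
# Pairing records for 2-anonymity via a min-cost matching on a record graph
# (illustrative sketch; assumes an even number of records).
from itertools import combinations

import networkx as nx  # assumed available for the matching step


def suppression_cost(r1, r2):
    # Both records in a pair must suppress every attribute on which they differ.
    return 2 * sum(1 for a, b in zip(r1, r2) if a != b)


def pair_records(records):
    g = nx.Graph()
    for i, j in combinations(range(len(records)), 2):
        # Negate the cost so that max-weight matching yields a min-cost pairing.
        g.add_edge(i, j, weight=-suppression_cost(records[i], records[j]))
    return nx.max_weight_matching(g, maxcardinality=True)


if __name__ == "__main__":
    table = [("25", "2000", "A"), ("25", "2000", "B"),
             ("40", "2010", "C"), ("41", "2010", "C")]
    for i, j in pair_records(table):
        print(f"group records {i} and {j}, cost {suppression_cost(table[i], table[j])}")
```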

    Validating privacy requirements in large survey rating data

    A recent study shows that supposedly anonymous movie rating records can be de-identified by using a little auxiliary information. In this chapter, we study the problem of protecting the privacy of individuals in large public survey rating data. Such rating data usually contains ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues constitute personal private information. Even when survey participants do not reveal any of their ratings, their survey records are potentially identifiable by using information from other public sources. To address this, we propose a novel (k, e, l)-anonymity model to protect privacy in large survey rating data, in which each survey record is required to be 'similar' to at least k−1 others based on the non-sensitive ratings, where the similarity is controlled by e, and the standard deviation of the sensitive ratings is at least l. We study an interesting yet non-trivial satisfaction problem of the proposed model, which is to decide whether a survey rating data set satisfies the privacy requirements given by the user. For this problem, we investigate its inherent properties theoretically and devise a novel slice technique to solve it. We also discuss how to anonymise data by using the result of the satisfaction problem. Finally, we conduct extensive experiments on two real-life data sets, and the results show that the slicing technique is fast, scalable with data size, and much more efficient in terms of execution time and space overhead than the heuristic pairwise method.
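    A brute-force sketch of the satisfaction check implied by the (k, e, l) model, assuming one sensitive rating per record and taking the standard deviation over each record's e-similar group; the abstract leaves these details open, and the slicing data structure itself is not reproduced. This pairwise baseline is the kind of method the slice technique is designed to outperform.
```python
# Brute-force (k, e, l) satisfaction check over (non-sensitive, sensitive) records.
from statistics import pstdev


def close(u, v, e):
    return all(abs(a - b) <= e for a, b in zip(u, v))


def satisfies_kel(records, k, e, l):
    """records: list of (non_sensitive_ratings, sensitive_rating) pairs."""
    for i, (ns_i, s_i) in enumerate(records):
        group = [s_j for j, (ns_j, s_j) in enumerate(records)
                 if j != i and close(ns_i, ns_j, e)]
        if len(group) < k - 1:           # not enough e-similar records
            return False
        if pstdev(group + [s_i]) < l:    # sensitive ratings too homogeneous
            return False
    return True


if __name__ == "__main__":
    data = [((5, 4), 1), ((5, 5), 4), ((1, 2), 5), ((2, 2), 2)]
    print(satisfies_kel(data, k=2, e=1, l=1.0))  # True for this toy table
```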

    Satisfying privacy requirements: one step before anonymization

    In this paper, we study a problem of privacy protection in large survey rating data. The rating data usually contains ratings of both sensitive and non-sensitive issues, and the ratings of sensitive issues include personal information. Even when survey participants do not reveal any of their ratings, their survey records are potentially identifiable by using information from other public sources. We propose a new (k, ε, l)-anonymity model, in which each record is required to be similar to at least k−1 others based on the non-sensitive ratings, where the similarity is controlled by ε, and the standard deviation of the sensitive ratings is at least l. We study an interesting yet non-trivial satisfaction problem of the (k, ε, l)-anonymity model, which is to decide whether a survey rating data set satisfies the privacy requirements given by users. We develop a slice technique for the satisfaction problem, and the experimental results show that the slicing technique is fast, scalable and much more efficient in terms of execution time than the heuristic pairwise method.

    Privacy preserving data sharing in data mining environment

    Numerous organizations collect and distribute non-aggregate personal data for a variety of purposes, including demographic and public health research. In these situations, the data distributor is often faced with a quandary: on the one hand, it is important to protect the anonymity and personal information of individuals; on the other hand, it is also important to preserve the utility of the data for research. This thesis presents an extensive study of this problem. We focus primarily on notions of anonymity that are defined with respect to individual identity, or with respect to the value of a sensitive attribute. We discuss anonymization techniques for relational data and for large survey rating data. For relational data, we propose a variety of techniques that use generalization (also called recoding) and microaggregation to produce a sanitized view while preserving the utility of the input data. Specifically, we provide a new structure called the 'Privacy Hash Table', propose three enhanced privacy models to limit privacy leakage, inject purpose and trust into the data anonymization process to increase the utility of the anonymized data, and enhance the microaggregation method by using concepts from Information Theory. For survey rating data, we investigate two important problems in anonymizing survey rating data: the satisfaction problem and the publication problem. By utilizing the characteristics of sparseness and high dimensionality, we develop a slicing technique for the satisfaction problem. By using a graphical representation, we provide a comprehensive analysis of graphical modification strategies. For all the techniques developed in this thesis, we include extensive evaluations indicating that they make it possible to distribute high-quality data that respects several meaningful notions of privacy.
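    As a small illustration of the microaggregation idea mentioned above, the sketch below groups numeric quasi-identifier records into clusters of at least k and replaces each record with its cluster centroid. The naive sort-and-slice grouping is an assumption for illustration, not one of the thesis's enhanced methods.
```python
# Minimal microaggregation: cluster records into groups of >= k and release centroids.
def microaggregate(records, k):
    """records: list of equal-length numeric quasi-identifier tuples."""
    order = sorted(range(len(records)), key=lambda i: records[i])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())  # keep every group at size >= k
    result = list(records)
    for group in groups:
        centroid = tuple(
            sum(records[i][d] for i in group) / len(group)
            for d in range(len(records[0]))
        )
        for i in group:
            result[i] = centroid  # each record becomes indistinguishable in its group
    return result


if __name__ == "__main__":
    table = [(23, 50_000), (25, 52_000), (41, 90_000), (44, 87_000), (60, 30_000)]
    for original, masked in zip(table, microaggregate(table, k=2)):
        print(original, "->", masked)
```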